Exploratory Data Analysis of Software Repositories via GPU Processing
Authors
Abstract
Analyzing software repositories with thousands of artifacts is data intensive, which makes interactive exploratory analysis of such data infeasible. We introduce a novel approach, Dominoes, that supports automated exploration of relationships among project elements and gives users the flexibility to explore the many types of project relationships on the fly. Dominoes organizes data extracted from software repositories into multiple matrices that can be treated as domino pieces (e.g., [commit|method]). Pieces can be connected through a set of matrix operations to derive additional domino pieces. These derived pieces carry semantics about relationships between project entities (e.g., the number of commits in which two methods co-occurred) and can be used for further exploration. Because domino pieces can be combined iteratively, this opens up a wide range of possible analyses. Our proposed matrix representation and operations allow fast and efficient processing of large volumes of data on a highly parallel architecture, such as a GPU.
Keywords: exploratory data analysis; software dependencies; GPU computing
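As a rough illustration of how connecting domino pieces can work, the sketch below derives a [method|method] piece from a [commit|method] piece by multiplying the transposed matrix with the original. The method names, the toy data, and the helper functions are hypothetical and are not taken from the Dominoes tool; the code runs on the CPU in plain Python, whereas the abstract describes running such matrix operations on a GPU.

```python
# Hypothetical toy data: rows are commits, columns are methods.
# A 1 means the commit changed that method.
methods = ["parse", "render", "save"]
commit_method = [
    [1, 1, 0],  # commit 0 changed parse and render
    [1, 1, 1],  # commit 1 changed all three methods
    [0, 1, 1],  # commit 2 changed render and save
    [1, 0, 0],  # commit 3 changed parse only
]


def transpose(matrix):
    """Swap rows and columns: [commit|method] becomes [method|commit]."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


# [method|commit] x [commit|method] = [method|method].
# Entry (i, j) counts the commits in which methods i and j both changed;
# the diagonal counts the commits that changed each method.
method_method = matmul(transpose(commit_method), commit_method)

for name, row in zip(methods, method_method):
    print(name, row)
```

Each entry of the product is an independent dot product, which is why this kind of operation maps well to a highly parallel architecture.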
Similar resources
Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is in the order of seconds, we can use Tomosynthesis systems to perform Tomosynthesis-guided Interventional procedures. This research has been designed to study u...
Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...
Performance/Power Design Space Exploration and Analysis for GPU Based Software
Recently, there have been tremendous interests in the acceleration of general computing applications using a Graphics Processing Unit (GPU). Now the GPU provides the computing powers not only for fast processing of graphics applications, but also for general computationally complex data intensive applications. On the other hand, power and energy consumptions are also becoming important design c...
Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics.
Data processing is a central and critical component of a successful proteomics experiment, and is often the most time-consuming step. There have been considerable advances in the field of proteomics informatics in the past 5 years, spurred mainly by free and open-source software tools. Along with the gains afforded by new software, the benefits of making raw data and processed results freely av...
Virtualization Infrastructure (VMware ESXi & NVidia GPU) ASTOR WEB - Portal VMs
Modern data analysis applications for 2D/3D data samples require complex visual output features which are often based on OpenGL, a multi-platform API for rendering vector graphics. They demand special computing workstations with a corresponding CPU and GPU power, enough main memory and fast network interconnects for a performant remote data access. For this reason, users depend heavily on avail...